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Abstract 

Perfect knowledge of the underlying state transition probabilities is necessary for designing an optinnal intervention 
strategy for a given Markovian genetic regulatory network. However, in nnany practical situations, the connplex nature 
of the network and/or identification costs linn it the availability of such perfect knowledge. To address this difficulty, we 
propose to take a Bayesian approach and represent the system of interest as an uncertainty class of several models, 
each assigned some probability, which reflects our prior knowledge about the system. We define the objective 
function to be the expected cost relative to the probability distribution over the uncertainty class and formulate an 
optimal Bayesian robust intervention policy minimizing this cost function. The resulting policy may not be optimal for 
a fixed element within the uncertainty class, but it is optimal when averaged across the uncertainly class. Furthermore, 
starting from a prior probability distribution over the uncertainty class and collecting samples from the process over 
time, one can update the prior distribution to a posterior and find the corresponding optimal Bayesian robust policy 
relative to the posterior distribution. Therefore, the optimal intervention policy is essentially nonstationary and 
adaptive. 

Keywords: Optimal intervention; Markovian gene regulatory networks; Probabilistic Boolean networks; Uncertainty; 
Prior knowledge; Bayesian control 



Introduction 

A fundamental problem of translational genomics is to 
develop optimal therapeutic methods in the context of 
genetic regulatory networks (GRNs) [1]. Most previous 
studies rely on perfect knowledge regarding the state 
transition rules of the network; however, when dealing 
with biological systems such as cancer cells, owing to 
their intrinsic complexity, little is known about how they 
respond to various stimuli or how they function under 
certain conditions. Moreover, if there exists any knowl- 
edge regarding their functioning, it is usually marginal 
and insufficient to provide a perfect understanding of the 
full system. To address uncertainty, one can construct an 
uncertainty class of models, each representing the system 
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of interest to some extent, and optimize an objective func- 
tion across the entire uncertainty class. In this way, success 
in therapeutic applications is fundamentally bound to the 
degree of robustness of the designed intervention method. 

Markovian dynamical networks, especially probabilistic 
Boolean networks (PBNs) [2], have been the main frame- 
work in which to study intervention methods due to their 
ability to model randomness that is intrinsic to the inter- 
actions among genes or gene products. The stochastic 
state transition rules of any PEN can be characterized 
by a corresponding Markov chain with known transition 
probability matrix (TPM) [3]. Markov decision processes 
(MDPs), on the other hand, are a standard framework 
for characterizing optimal intervention strategies. Many 
GRN optimization problems have been formulated in the 
context of MDPs - for instance - infinite-horizon control 
[4], constrained intervention [5], optimal intervention in 
asynchronous GRNs [6], optimal intervention when there 
are random-length responses to drug intervention [7], and 
optimal intervention to achieve the maximal beneficial 
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shift in the steady-state distribution [8]. Herein, PBNs will 
be our choice of reference model for GRNs. 

The first efforts to address robustness in the design 
of intervention policies for PBNs assumed that the 
errors made during data extraction, discretization, gene 
selection and network generation introduce a mis- 
match between the PEN model and the actual GRN 
[9,10]. Therefore, uncertainties manifest themselves in the 
entries of the TPM. A minimax approach was taken in 
which robust intervention policies were formulated by 
minimizing the worst-case performance across the uncer- 
tainty class [9]. Thus, the resulting policies were typically 
conservative. To avoid the detrimental effects of extreme, 
but rare, states on minimax design and motivated by the 
results of Bayesian robust filter design [11], the authors 
in [10] adopted a Bayesian approach whereby the opti- 
mal intervention policy depends on the prior probability 
distribution over the uncertainty class of networks. Con- 
structing a collection of optimal policies, each being opti- 
mal for a member of the uncertainty class, the goal was to 
pick a single policy from this collection that minimizes the 
average performance relative to the prior distribution. The 
corresponding policy provides a model-constrained robust 
(MCR) policy. It was noted that this model-constrained 
policy may not yield the best average performance among 
all possible policies (we will later define the set of all possi- 
ble policies for this problem). The authors also considered 
a class oi globally robust (GR) policies, which are designed 
optimally only for a centrality parameter, such as the 
mean or median, to represent the mass of the uncertainty 
distribution. 

Since [10] was concerned only with stationary poli- 
cies, it did not consider the possibility of finding non- 
stationary policies under a Bayesian updating framework, 
where state transitions observed from the system are 
used directly to enrich the prior knowledge regarding the 
uncertainty class. The resulting nonstationary interven- 
tion policy, which we refer to it as the optimal Bayesian 
robust (OBR) policy, is our main interest in the present 
paper. As our main optimization criterion, we use the 
expected total discounted cost in the long run. This choice 
is motivated by the practical implications of discounted 
cost in the context of medical treatment, where the dis- 
counting factor emphasizes that obtaining good treatment 
outcomes at an earlier stage is favored over later stages. 

Since the early development of MDPs, it was recognized 
that when dealing with a real-world problem it seldom 
happens that the decision maker is provided with the full 
knowledge of the TPM, but rather some prior informa- 
tion often expressed in a probabilistic manner. Taking a 
Bayesian approach, an optimal control policy may exist in 
the expected value sense specifying the best choice of con- 
trol action in each state. Since the decision maker's state 
of knowledge about the underlying true process evolves 



in time as the process continues, the best choice of con- 
trol action at each state might also evolve. Because the 
observations are acquired through a controlled process (a 
control action is taken at every stage of the process), the 
optimal policy derived through the Bayesian framework 
may not necessarily ever coincide with a policy that is 
optimal for the true state of nature. In fact, frequently, the 
optimal policy is not self-optimizing [12]; rather, optimal 
control will provide the best trade-off between exploration 
rewards and immediate costs. 

Bellman [13] considered a special case of this problem - 
the two-armed bandit problem with discounted cost - and 
later used the term adaptive control for control processes 
with incompletely known transition probabilities. He 
suggested transforming the problem into an equivalent 
dynamic program with completely known transition laws 
for which the state now constitutes both the physical state 
of the process and an information state summarizing the 
past history of the observed state transitions from the pro- 
cess [14]. This new state is referred to as the hyper state. 
Along this line of research, authors in [15-17] developed 
the theory of the OBR policy for Markov chains with 
uncertainty in their transition probabilities, where there is 
a clear notion of optimality defined with respect to all pos- 
sible scenarios within the uncertainty class. This approach 
is in contrast with the MCR methodology because the 
resulting policy may not be optimal for any member of the 
uncertainty class but it yields the best performance when 
averaged over the entire uncertainty class. 

Following the methodology proposed in [17] and 
assuming that the prior probability distribution of a ran- 
dom TPM belongs to a conjugate family of distributions 
which are closed under consecutive observations, one 
can formulate a set of functional equations, similar to 
those of fully known controlled Markov chains, and use 
a method of successive approximation to find the unique 
set of solutions to these equations. In this paper, we adopt 
this approach for the robust intervention of Markovian 
GRNs and provide a simulation study demonstrating the 
performance of OBR policies compared with several sub- 
optimal methods, such as MCR and two variations of 
GR policies, when applied to synthetic PBNs with vari- 
ous structural properties and parameters, as well as to a 
mutated mammalian cell cycle network. 

The paper is organized as follows. First, we give an 
overview of controlled PBNs and review the nominal 
MDP problem where the TPMs of the underlying Markov 
chain are completely known. We then formulate the OBR 
policy for PBNs with uncertainty in their TPMs and 
provide the dynamic programming solution to this opti- 
mization problem. We demonstrate a conjugate family of 
probability distributions over the uncertainty class where 
each row of the random TPM follows a Dirichlet distribu- 
tion with certain parameters. Assuming that the rows are 
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independent, the posterior probability distribution will 
again be a Dirichlet distribution with updated parameters. 
This provides a compact representation of the dynamic 
programming equation and facilitates the computations 
involved in the optimization problem. Several related sub- 
optimal policies are also discussed in detail. Finally, we 
provide simulation results over both synthetic and real 
networks, comparing the performance of different design 
strategies discussed in this paper. 

Methods 

Controlled PBNs 

PBNs constitute a broad class of stochastic models for 
transcriptional regulatory networks. Their construction 
takes into account several random factors, including 
effects of latent variables, involved in the dynamical 
genetic regulation [3]. The backbone of every PBN is laid 
upon a collection of Boolean networks (BNs) [18]. A BN 
is composed of a set of n nodes, V = {v^, v^, . . . , v^} 
(representing expression level of genes g^>g^>. . .>g^ or 
their products) and a list of Boolean functions F = 
{f^ff^f . . . ff^} describing the functional relationships 
between the nodes. We restrict ourselves to binary BNs, 
where we assume that each node takes on value of 0, 
corresponding to an unexpressed (OFF) gene and 1, cor- 
responding to an expressed (ON) gene. This definition 
extends directly to any finitely discrete- valued nodes. The 
Boolean function/^ : {0, ly^ {0, 1} determines the 
value of node / at time /c + 1 given the value of its pre- 
dictor nodes at time k by v^^^ = /^(v^^, v^^, . . . , v^'), 
where {v^^, v^"^, . . . , v^^'} is the predictor set of node v^ In 
a BN, all nodes are assumed to update their values syn- 
chronously according to F. The dynamics of a BN are 
completely determined by its state transition diagram 
composed of 2^ states. Each state corresponds to a vec- 
tor Vy^ = (v^, • • • > known as the gene activity profile 
(GAP) of the BN at time k. To make our analysis more 
straightforward, we will replace each GAP, v/^, with its dec- 
imal equivalent denoted by Xk = 1 + Yll=i where 
XkeS = {h...,2'']iov alU. 

A PBN is fully characterized by the same set of n nodes, 
V, and a set of m constituent BNs, F = . . . 

called contexts, a selection probability vector R = 
{r^, r^, . . . , r^} over F (r^ > 0 for / = 1, and 
= 1)> a network switching probability q > 0, 
and a random gene perturbation probability p > 0. At 
any updating epoch, depending on the value of a random 
variable § g {0, 1}, with = I) = one of two 
mutually exclusive events will occur. If ^ = 0 then the val- 
ues of all nodes are updated synchronously according to 
an operative constituent BN; if § = 1 then another oper- 
ative BN, F^ e F, is randomly selected with probability 
and the values of the nodes are updated accordingly. The 



current BN may be selected consecutively when a switch 
is called for [1]. PBNs also admit random gene perturba- 
tions where the current state of each node in the network 
can be randomly flipped with probability p, 

A PBN is said to be context-sensitive iiq < 1; otherwise, 
a PBN is called instantaneously random. The number of 
states in a context-sensitive PBN is m2^, whereas the state 
transition diagram of an instantaneously random PBN is 
composed of the same 2^ states in S. It is shown in [19] 
that averaging over the various contexts, relative to R, 
reduces the transition probabilities of a context-sensitive 
PBN to an instantaneously random PBN with identical 
parameters. PBNs with only one constituent BN, i.e., m = 
1, are called BNs with perturbation and are of particular 
interest in some applications [8,20]. For the sake of sim- 
plicity and reducing the computational time, we will focus 
only on instantaneously random PBNs. 

Since the nature of transitions from one state to another 
in a PBN is stochastic and has the Markov property, 
we can model any PBN by an equivalent homogeneous 
Markov chain, whose states are members of S and the 
TPM of this Markov chain can be calculated as described 
in [19]. We denote the TPM of an instantaneously ran- 
dom PBN by V and let {Z^ G S,k = 0, 1, . . .} be the 
stochastic process of the state transitions for this PBN. 
Originating from state i e S, the successor state j e S 
is selected randomly according to the transition proba- 
bility Vij = P(Z/^+i = j I Z/^ = i), the (/,;) element of the 
TPM. For every / G 5, the transition probability vector 
CPii, Vi2> • • • > Vi\s\) is a stochastic vector such that Vij > 0 
and J2jeS ^ij — ^ every / G 5. Random gene pertur- 
bation guarantees the ergodicity of the equivalent Markov 
chain, resulting in a unique invariant measure equal to its 
limiting distribution. 

To model the effect of interventions, we assume that 
PBNs admit an external control input. A, from a set of 
possible inputs signals, A that determines a specific type 
of intervention on a set of control genes. It is common to 
assume that the control input is binary, i.e., A = {0, 1}, 
where A = 0 indicates no-intervention and A = I indi- 
cates that the expression level of a single control gene, g^ 
(or equivalently v^), for a given c G {1, 2, . . . , w}, should be 
flipped. For this control scheme, A = 0 does not alter the 
TPM of the original uncontrolled PBN. However, assum- 
ing that the network is in state the action A = 1 replaces 
the row corresponding to this state by the row that corre- 
sponds to the state /, where the binary representation of / 
is the same as / except v^ being flipped. The effect of this 
binary control scheme on any PBN can be easily general- 
ized to more than one control gene with more than two 
control actions; in this paper, we only consider the binary 
control scheme. 

Let {(Zk,Aj,),Zk G S.Aj, e Ak = 0,1,.. .} denote the 
stochastic process of a state-action pair. The law of motion 
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for the controlled network, with binary external control, is 
represented by a matrix V(a) with its (/,/) element defined 



for the controlled network [12]. In the nominal optimiza- 
tion problem, we desire an intervention policy fz e Ai 
such that the objective function 



(N-1 



Vij(a) = P(Z/,+i =j \ Zi, = UA], = a) 



(1) 



Vij{d) is the probability of going to state ; g 5 at time /c+ 1 
from state / g S, while taking action a ^ A, dX time k. 
By this construction, it is clear that the controlled TPM, 
V{a)y can be calculated directly from V. 

The nominal problem 

External intervention in the context of Markovian net- 
works refers to a class of sequential decision making 
problems in which actions are taken at discrete time units 
to alter the dynamics of the underlying GRN. It is usu- 
ally assumed that the decision maker can observe the 
state evolution of the network at consecutive time epochs 
k = 0, 1, . . . , A/', where the horizon N may be finite or 
infinite. At each /c, upon observing the state, the decision 
maker chooses an action from A that will subsequently 
alter the dynamics of the network. Hence, the stochastic 
movement of the GRN from one state to another is com- 
pletely characterized based on the current state and action 
taken at this state by (1). 

Associated with each state and action, there is an imme- 
diate cost function^ :<Sx^x5^Mtobe accrued until 
the next decision epoch, which we assume is nonnegative 
and bounded. This cost may reflect the degree of desir- 
ability of different states and/or the cost of intervention 
that is applied. Whenever the process moves from state / 
to ; under action a known cost gij{a) is incurred. We 
also assume that k e (0, 1) is the discount factor reflect- 
ing the present value of the future cost. An intervention 
policy, denoted by /x, is a prescription for taking actions 
from the set A at each point k in time. In general, one 
can allow a policy for taking an action at time k to be 
a mapping from the entire history of the process up to 
time k to the action space. This mapping need not be 
deterministic; on the contrary, it might involve a random 
mechanism that is a function of the history. However, for 
the problem we consider, there exists a deterministic pol- 
icy that is optimal. We denote the set of all admissible 
policies by M. The TPM initial state Zq = /, and any 
given policy /x = {/xq, /xi, . . .} in determine a unique 
probability measure, , over the space of all trajectories 
of states and actions, which correspondingly defines the 
stochastic processes Z^ and of the states and actions 



7^(0= lim Et\Y,>J'gz,z,^Mk) 



k=0 



(2) 



is minimized, i.e., /|>(0 = min^^A^/pCO for i ^ In 
the above equation, denotes expectation relative to the 
probability measure Pf, 

This optimization problem is usually solved by formu- 
lating a set of simultaneous functional equations and a 
mapping T/ : <S ^ M, obtained by applying the dynamic 
programming mapping to any function / : 5 ^ M, for all 
/ G S defined by 



(TJ)(i) = min 

aeA 



jeS 



jeS 



(3) 



The optimal cost function /* uniquely satisfies the above 
functional equation, i.e., it is the fixed point of the map- 
ping T. One can determine the optimal policy with the 
help of convergence, optimality, and uniqueness theorems 
for the solution, proven in [21]. These results furnish 
an iterative method for successive approximation of the 
optimal cost function, which in turn gives the optimal 
intervention policy. It can be further shown that the opti- 
mal intervention policy belongs to the class of stationary 
deterministic policies, meaning that /x/^ = /x for all k and 
jji \ S ^ A is di single-valued mapping from states to 
actions. 

OBR intervention policy 

In many real- world intervention scenarios, perfect knowl- 
edge regarding V may be unavailable or very expensive to 
acquire. Therefore, we resort to a probabilistic character- 
ization of the elements of V and optimize relative to this 
uncertainty. Our results in this section are mainly derived 
from the Bayesian treatment of MDPs by [17]. Let 



Q = [V :Vis\S\^ \SlVij>0, 



Y^Vij = liov alhjGcS 

jeS 



(4) 



denote the set of all valid uncontrolled TPMs. The uncer- 
tainty about the random matrix V is characterized by the 
prior probability density ttCP) over the set Given n{V) 
and some initial state /, we define 



/^(/,7r) = lim £f 



N^oo 



IN- 

lk=o 



'gZj,Zj,+Mk) 



(5) 
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where the expectation is taken not only with respect to the 
random behavior of the state-action stochastic process but 
also with respect to the random choice of V according to 
its prior distribution, 7t(V), The goal is to find an optimal 
policy /X* such that (5) is minimized for any / e S and any 
prior distribution tt, i.e., /x* = argmin^^^/^(/, tt). We 
denote the optimal cost by /*(/, tt). 

Suppose that we could find optimal intervention policies 
for every element of Q. Letting J^(i) denote the optimal 
cost for any V e Q and / e S and assuming that the opti- 
mal cost /*(/, tt) exists, we have Ej^; [/|,(0] < ^) for all 
/ e S and any jr. In other words, Ej^ [/|>(0] is the best that 
could be achieved if we were to optimize for every element 
of the uncertainty class for fixed / and tt. 

Since at every stage of the problem an observation is 
made immediately after taking an action, we can utilize 
this additional information and update the prior distribu- 
tion to a posterior distribution as the process proceeds 
in time. Therefore, we can treat 7t(V) as an additional 
state and call tt) the hyperstate of the process. From 
this point of view, we seek an intervention policy that 
minimizes the total expected discounted cost when the 
process starts from a hyperstate (/, tt). Suppose the true, 
but unknown, TPM is P. At time 0, the initial state zq 
is known and V is distributed according to tt. Based on 
Zq and TT, the controller chooses an action ao according 
to some intervention policy. Based on (zq^ clq, V) the new 
state zi is realized according to the probability transition 
rule VzQZi (^o) and a cost gzQZi (<^o) is incurred. Based on 
{zQ,7t ,aQ,zi), the controller chooses an action ai accord- 
ing to some (possibly another) intervention policy and 
so on [12]. Although the number of states in S and 
actions in A are finite, the space of all possible hyperstates 
is essentially uncountable. Therefore, finding an optimal 
intervention policy which provides a mapping from the 
space of hyperstates to the space of actions in a sense sim- 
ilar to the nominal case is rather difficult. However, as we 
will see, it is possible to find an optimal action for a fixed 
initial hyperstate using an equivalent dynamic program. 

Dynamic programming solution 

We assume that the rows of V are mutually indepen- 
dent. Note that this assumption might not hold true for 
a large class of problems; however, the analysis becomes 
overwhelmingly complicated if one is willing to relax this 
assumption. The posterior probability density of when 
the process moves from state / to state / under control a, 
is found via Bayes' rule: 



cPij7T{V), if ^ = 0, 



(6) 



where c and are normalizing constants depending on 
and /. Under the sequence of events described above. 



Martin [17] showed that the minimum expected dis- 
counted cost over an infinite period, N = oo, exists and 
formulated an equivalent dynamic program with a set 
of simultaneous functional equations. The dynamic pro- 
gramming operator T, similar to (3) but now with the 
hyperstate (/, tt), takes the following form: 



(77)(/,7r(P)) = min 

aeA 



jeS 



jeS 



(7) 



for all i G <S, where Vij{a) = E[Vij{a)] with respect to the 
prior probability density function tt. It is shown in [17] 
that there exists a unique bounded set of optimal costs /* 
satisfying 



/*(/, TT CP)) = min 
aeA 



JeS 



jeS 

which is the fixed point of the operator T. Since the space 
of all possible hyperstates (/, tt) is uncountable, construc- 
tion of an optimal intervention policy for all (/, tt), except 
for some special cases, may not be feasible. However, 
given that the process starts at (/, tt), the minimization 
argument in the above equation yields an optimal action 
to take only for the current hyperstate. 

The difficulty in solving (7), which makes it more com- 
plicated than (3), is that the total expected discounted 
cost when different actions are taken now involves the 
difference in expected immediate costs and the expected 
difference in future costs due to being in different states at 
the next period as well as the effect of different informa- 
tion states resulting from these actions [22]. It should be 
noted that since the decision maker s knowledge regard- 
ing the uncertainty about V evolves with each transition, 
the intervention policy will also evolve over time. In a 
sense, the optimal policy will adapt, implying that station- 
ary optimal policies as defined for the nominal problem do 
not exist. The optimal nonstationary intervention policy 
derived through the process discussed above is referred to 
as the OBR policy. 

Special case: independent Dirichlet priors 

Suppose that both prior and posterior distributions 
belong to the same family of distributions, i.e., they are 
conjugate distributions. Then, instead of dealing with 
prior and posterior at every stage of the problem, we will 
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only need to keep track of the hyper parameters of the 
prior/posterior distributions. A special case of the families 
of distributions closed under consecutive observations is 
the Dirichlet distribution, which is the conjugate prior of 
the multinomial distribution. 

Let the initial state zq be known and = 
{zq),Zi,Z2, . . . ,Zn) represent a sample path of n indepen- 
dent transitions recorded from the network under the 
influence of an intervention policy. Then the posterior 
probability density of n\V), can be found using Bayes' 
rule: 



and 



ieS jeS 



(8) 



where ^ij denotes the number of transitions in Xyi from 
state / to state y. The right product in (8) is called the like- 
lihood function and the constant of proportionality can 
be found by normalizing the integral of 7t\V) over Q to 
1. Note that although the transitions made in result 
from an intervention policy, we have formulated the like- 
lihood function only in terms of the elements of V (and 
not V{a)), This is a consequence of our particular inter- 
vention model, where we can substitute for Vij(a) with 
whenever a = 1 2iS shown in (1). To be more precise, we 
have Pij = Pij(0) + where Pij(a) is the number of 

transitions in z^ from state / to state / under control a. 

For a fixed state a transition to state / is an out- 
come of a multinomial sampling distribution with param- 
eters {Vii, Vi2, " Vi\s\} constituting the standard (\S\ — 
1) -simplex. As stated in the beginning of this section, 
the conjugate prior for the multinomial distribution is 
given by the Dirichlet distribution. By the independence 
assumption imposed on the rows of P, one can write the 
prior for V as 



7t(V) = c(a)Y\Y\(VijrJ-\ 



(9) 



ieS eS 



where aij > 0 and a = [aij] is the hyperparameter matrix 
with the rows arranged in the same manner as 'P. The 
constant of proportionality is given by 



c(a) = Y\ 



ieS 



nr(«,;) ' 



(10) 



where F is the gamma function. The uniform prior dis- 
tribution is obtained if aij = 1 for all e S, As we 
increase a specific of//, it is as if we bias the posterior dis- 
tribution on the corresponding element of V with some 
transition samples before ever observing any samples. It 
can be verified that 



E[Vij] = 



leS 



(11) 



var[P^] = 



Vijil - Vij) 

E oiii + 1 * 
leS 



We also have the following theorem, which is due to 
Martin [17]. 

Theorem 1. Let V have a probability density function 
given in (9) and (10) with the hyperparameter matrix a 
and suppose that a sample with a transition count matrix 
P — [Pij] is observed, Thentheposterior probability density 
function ofV will have the same form as in (9) and (10), 
but with the hyperparameter matrix a -\- p. 

Assuming a as the hyperparameter representing 7t(V) 
and using Theorem 1, one can rewrite Equation 7 as 



(T7)(/,Qf) = min 

aeA 



jeS 



+xJ2^ij(a)J(j,a + y) 

jeS 

where y is a matrix of all zeros except yij = 1 if = 0 or 
Yj. = liia = 1, and 



Vij, if a = 0, 
Vy, ifa = h 



The optimal cost /* (/, a) is defined by 

= min /^(/,Qf), 

for a given / e S and prior hyperparameter a. Taking an 
approach based on the method of successive approxima- 
tion, let a) for /c = 0, 1, ... be defined recursively for 
all / e S and any valid hyperparameter matrix a by 



/^+l(/,Qf) 



mm 

aeA 



^Vij(a)gij(a) 



jeS 



jeS 



(12) 



with {}o(ifOt)} as a set of bounded initial functions. Under 
some mild conditions, the sequence of functions {A(/, a)} 
converges monotonically to the optimal solution 
for any / G S and uniformly for all valid a [17]. Faster 
rates of convergence can be achieved for smaller values 
of A. Assuming that the method of successive approxi- 
mation converges in K steps, then for a specific value of 
(/,Qf), one needs to evaluate (|^| x \S\y^ terminal val- 
ues necessary for the computation of Pii^a). Therefore, 
to minimize computational time, we restrict ourselves to 
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small values for X and K, Once the successive approxima- 
tion converges, an action a* that minimizes the RHS of 
(12) is optimal. 

The intervention policy optimally adapts to the con- 
secutive observations as follows: we start with an initial 
hyperstate (zoy oio)y with ao reflecting our prior knowledge 
regarding the unknown network (or equivalently V). We 
can calculate V using (11) with respect to ao and utilize 
the successive approximation method in (12) for a fixed 
K to find an optimal action a"^. We then apply the action 
a"^ to the network and let it transition from state Zo, or zq 
depending on the optimal action, to a new random state 
Zi according to V, We incorporate the new observation 
into our prior knowledge and update the hyperparameter 
matrix to ai by incrementing the entry at (zq^ Zi) or (5o, Zi) 
of the hyperparameter matrix ofo by 1. We repeat the entire 
optimization procedure, but now with the new hyperstate 
(;^i,Qfi), etc. A schematic diagram of this procedure is 
demonstrated in Figure 1. 

The extreme computational complexity of finding the 
OBR intervention policy for MDPs with large state-space 
poses a major obstacle when dealing with real- world prob- 
lems. It is relatively straightforward to implement the pro- 
cedure described above for networks with three or four 
genes. However, for larger networks, one should resort 
either to clever ways of indexing all possible transitions, 
such as hash tables or a branch-and-bound algorithm, 
or to approximation methods, such as reinforcement 
learning. See [12,22,23] for more details. An alternative 
approach, as we will demonstrate, is to implement sub- 
optimal methods that, in general, have acceptable per- 
formance. Yet another potential approach to circumvent 
the explosion of the space of all hyperstates is to reduce 
the size of the uncertainty class. For example, we can 
assume that some rows of the underlying TPM are per- 
fectly known and uncertainty is only on some other rows, 
with the implication that the regulatory network is par- 
tially known. We will leave the analysis of such approaches 
to future research. 

Suboptimal intervention policies 

Besides the OBR policy, three suboptimal policies are of 
particular interest: MCR, GR, and adaptive GR (AGR). 
Similar to the previous section, let V be random, having 
a probability density 7t(V) over the set of valid TPMs, 
defined in (4). 

Let M-MCR denote the set of all policies that are optimal 
for some element V e Each policy in A^mcr is 
stationary and deterministic (each corresponds to a 
problem with known TPM). Because Q is uncountable 
and there exits a finite number of stationary deterministic 
policies, one might find policies that are optimal for 
many elements of Q. Assuming that the initial state Zq 
is randomly distributed according to some probability 



distribution rf, the policy /xmcr yields the minimum cost, 
which is defined by 

/mcr(0= min £^[£,[/^(Zo)]], (13) 

where (Zq) is defined in (2) for any fixed Zq and V. Since 
we are limiting ourselves to policies in A4mcr> it is sel- 
dom the case that a single policy minimizes Ejt [/^ (Zq)] for 
all Zo G S. Hence, we take the expected value of (Zq) 
with respect to in (13) as a single value representing the 
expected cost. The resulting MCR intervention policy is 
therefore fixed for a given prior distribution in the sense 
that it will not adapt to the observed transitions. 

We define the GR policy as the minimizing argu- 
ment for the optimization problem given by /gr(0 = 
min^ex/^(0, for all / e 5, where V e Q is the mean 
of the uncertainty class Q with respect to the prior dis- 
tribution 7t, The optimization method presented for the 
nominal problem can be readily applied. Hence, the result- 
ing policy, /xgr, is stationary and deterministic. In the case 
of independent Dirichlet priors, V is given by Equation 11. 
Here we are considering the mean as an estimate for 
unknown V. However, one can use any other estimate of 
V and find the optimal policy in a similar fashion. Similar 
to the MCR policy, this intervention method is also fixed 
for a given prior distribution and it will not adapt to the 
observed transitions. 

The AGR policy is similar to the GR policy in the sense 
that it is optimal for the mean of the uncertainty class 
Q. However, instead of taking the mean with respect to 
the prior distribution tt and using the same policy for the 
entire process, we update tt to a posterior tt^ defined in 
(6), whenever a transition is made and calculate the mean 
of Q with respect to 7t\ Since the posterior evolves as 
we observe more and more transitions, the AGR policy 
also evolves - therefore, the name adaptive. We denote the 
cost and the corresponding policy resulting from this pro- 
cedure, for any initial hyperstate (/, tt), by /agr(^>^) and 
Magr> respectively. In the case of independent Dirichlet 
priors, we can simply replace tt with a. 

Results 

In this section, we provide a comparison study on the 
performance of optimal and suboptimal policies based on 
simulations on synthetically generated PBNs and a real 
network. Since we implement the method of successive 
approximation to calculate /xobr> we restrict ourselves to 
synthetic networks with n = 3 genes. Given that, as we 
will show, /xagr yields very similar performance com- 
pared to the optimal policy, we can implement fiAGR for 
networks of larger size and use it as the baseline for com- 
parison with other suboptimal policies, keeping in mind 
that the optimal policy should and will outperform any 
suboptimal method. 
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Synthetic networks 

We first consider randomly generated PBNs with n = 3 
genes and m = 3 equally likely constituent BNs (total 
number of states being 8) with the maximum number 
of predictors for each node set to 2 (// < 2 for all / G 
{1, 2, ... , n}). The bias of a randomly generated PBN is the 
probability that each of its Boolean regulatory functions 
takes on the value 1 in its truth table. We assume that 
the bias is taken randomly from a beta distribution with 
mean 0.5 and standard deviation 0.01. The gene perturba- 
tion probability is assumed tohep = 0.001. In the context 
of gene regulation, there are some genes associated with 
phenotypes (typically undesirable ones). We refer to these 
genes as target genes and our goal in controlling the net- 
work is to push the dynamics of these genes away from 
undesirable states towards desirable ones. Once the set of 
target genes is identified, one can partition the state space 
S into subsets of desirable and undesirable states, denoted 
by V and U, respectively. In our synthetic network simu- 
lations, we choose the control and target genes to be the 
least and most significant bits in the binary representation 
of states, respectively, and assume that downregulation of 
the target gene is undesirable. As for the discount factor 
and the immediate cost i\xnctiongij{a), we set k = 0.2 and 



2.1, ii j eU and a = h 

2.0, if J eU and a = 0, 

0.1, ifJeV and a = h 

0, otherwise. 



(14) 



the interpretation being that a cost will be incurred if the 
future state in undesirable or there is an intervention in 
the network. 

To design an OBR policy for a given network, we need 
to assign the prior probability distribution to the set Q, As 
discussed earlier, independent Dirichlet priors parameter- 
ized by a constitute a natural choice for this application. 
Therefore, we only need to assign values to a. The choice 
of prior hyperparameters plays a crucial role in the design 
of an optimal policy: the tighter the prior around the true, 
but unknown, TPM V, the closer the OBR cost is to that 



of V. Since our synthetic networks are generated ran- 
domly and not according to some biologically motivated 
GRN, it would be difficult to assign prior probabilities for 
individual networks. Therefore, we use the randomly gen- 
erated PBNs themselves for this purpose and perturb and 
scale the elements of the TPMs via the £ -contamination 
method. 

A random PBN, V, is first generated. This network will 
serve as the true, but unknown, PBN. Then a contamina- 
tion matrix Q of the same size (|5| x is generated, 
where each row is sampled uniformly from the \S — 1|- 
simplex. Note that Q is a valid TPM. We now define the 
hyperparameter matrix a by 



(15) 



where k > 0 controls the tightness of the prior around the 
true PBN and s g[0, 1] controls the level of contamina- 
tion. For networks with three genes, we assume that s = 
0.1 and demonstrate the effect of k on the performance of 
intervention policies. 

We generate 500 random PBNs, denoted by {A/"^} for 
/ = 1 to 500, for each set of parameters and calculate 
their TPMs, denoted by {V^}. These networks will serve 
as the ground-truth for our simulation study. For a given 
pair of K and e, we then construct hyperparameter matri- 
ces, denoted by {a^}, using (15), each corresponding to 
a random network. To compare the performance of dif- 
ferent intervention policies, for each randomly generated 
network A/'^ we take a Monte Carlo approach and gener- 
ate 500 random TPMs, denoted by {V^'^'} for = 1 to 500, 
from the a^-parameterized independent Dirichlet priors. 
The set {V^'^ } will essentially represent Q and the prior 
distribution. 

To design and evaluate the performance of /xmcr for 
each random PBN A/'^ we proceed as follows: We find 
the optimal intervention policy for each V^'^ , apply this 
policy to every element in the set {^^'^ }, and calculate 
the average over all equally likely initial states, Zq g 5, 
of the infinite-horizon expected discounted cost using (2) 
for that element. The expected performance of the each 



^n+i,<^n+i Optimization 
: : Observation 

Start with 

nil 

Update to posterior 

Figure 1 Optimization procedure for an OBR policy. We start with a hyperstate (z^, an). We calculate V using an and utilize the successive 
approximation method for a fixed Kto find an optimal action o*. We then apply the action o* to the network and let it transition from state z^, orz^ 
depending on the optimal action, to a new random state Zn+] according to V. We incorporate the new observation into our prior knowledge and 
update the hyperparameter matrix to an+] by incrementing the entry at (Zn,Zn+i ) or (Zn,Zn+i ) of the hyperparameter matrix 0;^ by 1 . We repeat the 
entire optimization procedure, but now with the new hyperstate (Zn+i , an+] ). 
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policy optimal for V^'^' , relative to the prior distribution, 
can be computed by taking the average of the resulting 
costs over all V^'^ . We repeat this procedure for every ele- 
ment of {V^'^ } and declare a policy MCR if it yields the 
minimum expected performance. We denote the expected 
cost function for a random PBN J\f^ obtained via an MCR 
policy by /[jcR. 

Finding /xgr for each PBN J\f^, on the other hand, is eas- 
ier and it requires only the value of the hyperparameter 
Once found, the performance of this poUcy is evaluated 
by applying it to all elements of {V^'^ } and taking the aver- 
age of the resulting costs. Similar to the MCR policy, we 
assume that the initial states are equally likely and calcu- 
late the average over all possible initial states. We denote 
the expected cost function corresponding to the GR policy 
derived for J\f^ by /^j^. 

To quantify the performance of the OBR policy for 
each random PBN J\f^, we directly evaluate the cost func- 
tion defined in (5) relative to the independent Dirichlet 
prior distribution, tt^, parameterized by a^. This is accom- 
plished using the sample set of 500 random TPMs, {V^'^ }. 
Starting from a hyperstate and a TPM V^'^\ we derive 
an optimal action from (12) using the method of succes- 
sive approximations with K = 5 and some initial cost 
function. We then observe a transition according to V^'^ 
and find the incurred discounted immediate cost accord- 
ing to (14), depending on the new observed state and the 
optimal action just taken. We update our prior hyperpa- 
rameter and carry out the optimization problem again, but 
now with the updated hyperparameter and the recently 
observed state, and accumulate the newly incurred dis- 
counted immediate cost. We iterate this for seven epochs, 
thus observing seven different hyperstates for a sampling 
path, and record the total accumulated discounted cost 
over this period. We then repeat this entire process, for 
the same V^'^ for 100 iterations (although the same TPM 
is used, different sampling paths will result due to random 
transitions), and take the average of all 100 total accumu- 
lated discounted cost values. This will represent the cost 
associated with V^'^ and the initial state. We implement a 
similar procedure for all initial states (assuming all equally 
likely) and all elements of {P^'^ } and take the average of 
the resulting costs, yielding the expected optimal cost, 
£^[/*(Zo, TT^)], with respect to the uniform probability 
distribution rj over the initial states in S. Since we use the 



same hyperparameter in our Monte Carlo simulation 
for a given random PBN J\f^, we denote the expected 
optimal cost obtained from a OBR policy by /qbr- 

We take a similar approach for evaluating the perfor- 
mance of jjLAGR' Instead of using the method of successive 
approximations at every epoch, we use the current value 
of the hyperparameter to calculate the mean of Q and use 
this to find the optimal action to take at that hyperstate. 
Every other step of the process is essentially the same to 
those of the OBR policy. We denote the expected optimal 
cost obtained from this poUcy by /^gr- 

We also evaluate three other cost functions for each 
PBNAr^:/[B := £^ [£,[/|,(Zo)] ], /| := £,[/;,(Zo)], 

and J^j := Ejj^ |^£^[/^(Zo)] j , where jfp is the cost of 
applying an optimal intervention policy corresponding to 
to an element V of Q. The first cost function, /^g, 
is a lower bound on the performance of the OBR pol- 
icy, /g^. The second cost function, Jj, corresponds to the 
cost of applying an optimal intervention policy as if we 
knew the true network, V^, to the true network itself. The 
third cost function, J^j, is the expected cost, relative to 
the prior, of applying an intervention policy that is opti- 
mal for the true network. We can calculate these cost 
functions assuming that Q and the prior distribution 
are represented by the set {V^'^'} corresponding to each 
PBNA/'^ 

All cost functions discussed above are defined relative 
to a given random PBN J\f^. Since we have 500 such net- 
works, for each parameter value, we report the average 
performance across all random networks and provide a 
statistical comparison on the performance of different 
intervention policies. The results are presented in Table 1. 
As seen in the table, the optimal policy performance, in 
the average sense, is consistently better than all subop- 
timal policies. The closest performance to the optimal 
method is achieved by the AGR policy, which is not sur- 
prising, since this policy adapts to the process over time by 
updating the prior distribution to a posterior distribution 
and optimizes with respect to the mean of the posterior. 

As it has been reported in the previous studies [7,8,24], 
the performance of an optimal policy might not signifi- 
cantly exceed those of suboptimal policies when averaged 
across random PBNs; nonetheless, there are networks for 
which the optimal policy notably outperforms the sub- 
optimal ones. To demonstrate this, we use the difference 



Table 1 Average costs across all 500 randomly generated PBNs with n = 3 genes and e = 0.1 

EJAb^ EU'mcr^ ^[^br] 

K = 0.] 0.7626 1.0803 1.0998 1.0948 1.0991 1.0816 1.0812 

K = ].0 0.8078 1.0296 1.0531 1.0520 1.0526 1.0458 1.0457 

K = 5.0 0.9417 1.0209 1.0525 1.0518 1.0513 1.0502 1.0501 
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Figure 2 Empirical CCDF of for different intervention policies across randomly generated PBNs with three genes. (A) /c = 0.1 . (B) 
K = ].0. (C)k = 5.0. 



between the optimal and suboptimal costs to quantify the 
gain made by implementing an optimal policy. We define 
percent decrease by 

Ai = 100 X 

where and fl denote two different intervention poli- 
cies. Since PBNs are randomly generated, will also be 
a random variable with a probability distribution. We esti- 
mate the complementary cumulative distribution func- 
tion (CCDF) of this distribution for different values of 
using its empirical distribution function. 

For networks with three genes, we assume that li = 
^OBR is suboptimal policy. Figure 2 shows the 

empirical CCDF of A^ for 500 random PBNs for different 
values of k and different intervention policies. The graphs 
illustrate that as the prior distribution gets tighter around 
the true TPM by increasing /c, the difference between the 
optimal and suboptimal policies vanishes. Again, the best 
performance among the suboptimal policies is achieved 

As suggested by these results, we may use the subopti- 
mal AGR policy instead of an optimal method for larger 
networks without a significant lose of optimality. For this 



purpose, we carry out a similar set of simulations with 
500 randomly generated PBNs of size n = 4 genes. We 
assume that each PBN consists of m = 3 equally likely 
constituent BNs with the maximum number of predictors 
for each node set to 2, the total number of states being 
16. The network bias is drawn randomly from a beta dis- 
tribution with mean 0.5 and standard deviation 0.01. The 
gene perturbation probability is p = 0.001. We generate 
prior distributions using (15) for each network and differ- 
ent parameter values for k and £. To model we draw 
5,000 random TPMs from each prior distribution. We 
assume that each random sampling path has length 10 and 
set the discounting factor k to 0.2. We emulate different 
sampling paths during the calculation of /^^j^ by repeat- 
ing the entire process for each randomly generated TPM 
for 1, 000 iterations and take the average. The results aver- 
aged over 500 random PBNs are presented in Table 2. The 
AGR policy yields the best performance relative to other 
suboptimal policies. 

We graph the empirical CCDF of A^ for these networks 
in Figure 3 for different values of the pair (/c, s) 
and different suboptimal policies. Here, we have that 

= /^^j^ and /q are any other suboptimal policy. Similar 
to networks with three genes, as the prior distributions 
get more concentrated around the true parameters, the 



Table 2 Average costs across all 500 randomly generated PBNs with n 


= 4 genes 








E[/lb] 


E[J[] 


E[J',,] 








(k,s) = (0.1,0.0) 


0.7559 


1 .0878 


1 .0869 


1 .0856 


1 .0869 


1.0773 


(k,s) = (1.0,0.0) 


0.8702 


1 .0888 


1 .0888 


1.0918 


1 .0888 


1 .0854 


(k,s) = (5.0,0.0) 


0.9510 


1.0579 


1.0578 


1.0612 


1.0578 


1.0572 


(k,s) = (0.1,0.1) 


0.7711 


1.1099 


1.1260 


1.1248 


1.1258 


1.1156 


(k,s) = (1.0,0.1) 


0.8722 


1.1106 


1.1278 


1.1314 


1.1276 


1.1236 


(k,s) = (5.0,0.1) 


0.9714 


1 .0826 


1.1011 


1.1049 


1.1009 


1.1002 


(k,s) = (0.1,0.25) 


0.7177 


1 .0796 


1.1289 


1.1234 


1.1248 


1.1133 


(k,s) = (1.0,0.25) 


0.8307 


1.0853 


1.1348 


1.1325 


1.1305 


1.1257 


(k,s) = (5.0,0.25) 


0.9729 


1 .0629 


1.1178 


1.1157 


1.1137 


1.1130 
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Figure 3 Empirical CCDF of aJ, for different suboptimal intervention policies across randomly generated PBNs with four genes. (A) 

(k,6) = (0.1,0.0). (B) (k,s) = (1.0,0.0). (C) (k,s) = (5.0, 0.0). (D) (k,s) = (0.1,0.1). (E) (/c,e) = (1 .0,0.1). (F) (k,£) = (5.0, 0.1 ). (G) (k,£) = 
(0.1,0.25). (H) (k,£) = (1.0,0.25). (l)(/c,£) = (5.0,0.25). 



difference between these suboptimal policies gets smaller 
and smaller. However, it can be seen that the GR pol- 
icy outperforms MCR for larger k, which could be due to 
the fact that GR and AGR policies differ very little when 
the effect of observations on the posterior distribution is 
dominated by the prior hyperparameters. 

Real network 

We construct a PBN corresponding to a reduced network 
from a mutated mammalian cell cycle network proposed 



Table 3 Boolean regulatory functions of a mutated 
mammalian cell cycle 



Gene 


Node 


Predictor functions 


CycD 


V] 


Extracellular signal 


Rb 


V2 


(\77 A V4 A 1^5 A iTg) 


E2F 


V3 


(\72 A \75 A \7g) 


CycE 


V4 




CycA 


V5 


(\/3 AVjAVqA (V7 a Vs)) V (Vs A 
A A (\/7 A Vs)) 


Cdc20 


ve 


Vg 


Cdhl 


V7 


(V^ AV^)V Ve 


UbcHlO 


V8 


Vj V (Vy AVg A (Vq V \/5 V Vg)) 


CycB 


Vg 


(V^AVj) 



in [25]. The original GRN is a BN with ten genes. Three 
key genes in the model are Cyclin D (CycD), retinoblas- 
toma (Rb), and p27, where cell division is coordinated with 
the overall growth of the organism through extracellular 
signals controlling the activation of CycD in the cell. A 
proposed mutation for this network is that p27 can never 
be activated (always OFF), creating a situation where both 
CycD and Rb might be inactive [25]. Under these condi- 
tions, the cell can cycle in the absence of any growth factor, 
thereby causing undesirable proliferation. Table 3 lists the 
Boolean functions for this real network. 

Since the size of the network is too large for the Bayesian 
treatment, we need to first reduce the number of genes 
to a more manageable size while preserving important 

Table 4 Boolean regulatory functions of a reduced 
mutated mammalian cell cycle 



Gene Node Predictor functions 

CycD V] Extracellular signal 

Rb V2 (\77 A \/2 A iTJ A Vs) 

CycA ^3 (\72 A \/3 A 1^5 ) V (\72 A V 'iTs ) 

UbcHl 0 V4 (V4 A Vs) V (V3 A Vs) 

CycB Vs V3 AVs 



Yousefi and Dougherty EURASIP Journal on Bioinformatics and Systems Biology 2014, 2014:6 
http://bsb.eurasipjournals.eom/content/2014/1/6 



Page 12 of 13 



Table 5 Total discounted cost of different suboptlmal policies for the reduced cell cycle network 






Jj 


^ET 


-'mcr 


-'gr 


JjKGR 


{K,£) = (}J.\ , U.U; 


U./dU/ 


U.yooD 




U.y40D 




n 1 ^ 

u.y^ 1 0 


{k,s) = (1.0,0.0) 


0.4990 


0.9685 


0.9675 


0.9614 


0.9675 


0.9571 


{k,s) = (5.0,0.0) 


0.6136 


0.9685 


0.9658 


0.9774 


0.9658 


0.9605 


{k,£) = (0.1,0.1) 


0.4501 


0.9685 


0.9239 


0.9268 


0.9239 


0.9144 


(k,s) = (1.0,0.1) 


0.5752 


0.9685 


0.9340 


0.9526 


0.9340 


0.9294 


(k,s) = (5.0,0.1) 


0.7507 


0.9685 


0.9326 


0.9465 


0.9326 


0.9316 


(k,£) = (0.1,0.25) 


0.3885 


0.9685 


0.8643 


0.8674 


0.8623 


0.8550 


{k,£) = (1.0,0.25) 


0.5140 


0.9685 


0.8728 


0.8860 


0.8730 


0.8694 


{k,£) = (5.0,0.25) 


0.7014 


0.9685 


0.8864 


0.9002 


0.8864 


0.8861 



dynamical properties of the network. We have imple- 
mented the methodology proposed in [26] and reduced 
the size of the network to the five genes shown in Table 4. 
Even for a network of this size, finding the OBR policy is 
computationally too expensive. Therefore, we only report 
results for suboptimal policies. 

We first construct an instantaneously random PBN for 
the reduced network. The PBN consists of five genes, 
CycD, Rb, CycA, UbcHlO and CycB, ordered from the 
most significant bit to the least significant bit in the binary 
representation. In the mutated network, depending on the 
state of the extracellular signal determining the state of 
CycD as being ON or OFF, we obtain two BNs. These 
two will serve as two equally likely constituent BNs. It 
is also assumed that the gene perturbation probability is 
0.01. Since cell growth in the absence of growth factors 
is undesirable, we define undesirable states of the state 
space to be those for which CycD and Rb are both down- 
regulated. We also choose CycA as the control gene. The 
immediate cost function is defined similarly to that of 
the synthetic network simulations (Equation 14). The dis- 
counting factor is k = 0.2. We calculate the TPM of this 
network and construct prior hyperparameter matrices a 
using (15) for various pairs of k and s. We generate 10, 000 
random TPMs from the prior distribution to represent 
the uncertainty class Q. We also generate 10, 000 differ- 
ent sampling paths of length 10 for each random TPM. 
The total costs are reported in Table 5, where we can see 
that the results are consistent with those obtained from 
synthetic networks. 

Conclusions 

Due to the complex nature of Markovian genetic regula- 
tory networks, it is commonplace not to possess accurate 
knowledge of their parameters. Under the latter assump- 
tion, we have treated the system of interest as an uncer- 
tainty class of TPMs governed by a prior distribution. The 



goal is to find a robust intervention policy minimizing 
the expected infinite-horizon discounted cost relative to 
the prior distribution. We have taken a Bayesian approach 
and formulated the intervention policy optimizing this 
cost, thereby resulting in an intrinsically robust policy. 
Owing to extreme computational complexity, the result- 
ing OBR policy is, from a practical sense, infeasible. Using 
only a few genes, we have compared it to several sub- 
optimal polices on synthetically generated PBNs. In this 
case, although there are PBNs where the OBR policy sig- 
nificantly outperforms the suboptimal AGR policy, on 
average there is very little difference. Hence, one can feel 
somewhat comfortable using the AGR policy while los- 
ing only negligible performance. Unfortunately, even the 
AGR policy is computationally burdensome. Hence, when 
applying it to the mammalian cell cycle network, we are 
restricted to five genes. 

The twin issues of uncertainty and computational com- 
plexity are inherent to translational genomics. Here we 
have examined the problem in the context of therapy, 
where the uncertainty is relative to network structure. It 
occurs to also in the other major area of translational 
genomics, gene-based classification. Whereas here the 
prior distribution is over an uncertainty class of networks, 
in classification it is over an uncertainty class of feature- 
label distributions and one looks for a classifier that is 
optimal, on average, across that prior distribution [27,28] . 
There is no doubt, however, that the complexity issue 
is much graver in the case of dynamical intervention. 
Hence, much greater effort should be placed on gaining 
knowledge regarding biochemical pathways and thereby 
reducing the uncertainty when designing intervention 
strategies [29]. This means more attention should be paid 
to classical biological regulatory experiments and less 
reliance on blind data mining [30]. 
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